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This paper formulates a penalized empirical likelihood (PEL) 
method for inference on the population mean when the dimension 
of the observations may grow faster than the sample size. Asymp- 
totic distributions of the PEL ratio statistic is derived under different 
component- wise dependence structures of the observations, namely, 
(i) non-Ergodic, (ii) long-range dependence and (iii) short-range de- 
pendence. It follows that the limit distribution of the proposed PEL 
ratio statistic can vary widely depending on the correlation struc- 
ture, and it is typically different from the usual chi-squared limit 
of the empirical likelihood ratio statistic in the fixed and finite di- 
mensional case. A unified subsampling based calibration is proposed, 
and its validity is established in all three cases, (i)-(iii). Finite sam- 
ple properties of the method are investigated through a simulation 
study. 

1. Introduction. In a seminal paper, Owen (1988) introduced the empiri- 
cal likelihood (EL) method for statistical inference on population parameters 
in a nonpar ametric framework, and showed that it enjoyed properties similar 
to the likelihood-based inference methods in a more traditional parametric 
framework. Following Owen (1988), the EL method has been extended to 
various complex inference problems; see, for example, Diccicio, Hall and Ro- 
mano (1991), Hah and Chen (1993), Qin and Lawless (1994), Owen (2001), 
Bertail (2006), Hjort, McKeague and Van Keilegom (2009), Chen, Peng and 
Qin (2009) and the references therein. An extension of the EL method in 
the high-dimensional context, where the dimension p of the observations 
increases with the sample size n, is given by Hjort, McKeague and Van Kei- 
legom (2009). Hjort, McKeague and Van Keilegom (2009) derives the limit 
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distribution of the EL ratio statistic based on p-dimensional estimating equa- 
tions when p — 7- oo with n at the rate p = o(n^/^). Chen, Peng and Qin (2009) 
improved upon the rate restriction in Hjort, McKeague and Van Keilegom 
(2009) and estabhshed a nondegenerate hmit distribution of the EL ratio 
statistic, allowing p = o(n^/^) under suitable regularity conditions. 

For applications to high-dimensional problems, such as those involving 
gene expression data, one encounters a p that is typically much larger than 
the sample size n. However, extension of the EL to such high-dimensional 
problems is itself a daunting task because the (standard) EL method is 
known to fail in such situations. An important result of Tsao (2004) shows 
that the definition of the EL for a p- dimensional population mean based on a 
sample size n breaks down on a set of positive probability whenever p > n/2; 
further, this probability is asymptotically nonnegligible. The main reason 
for this surprising behavior of the EL is that for p > n/2, the convex hull of 
n random vectors in is too small a set to contain the true mean with high 
probability. As a result, the standard EL approach cannot be applied to the 
"large p small n" problems with p > n/2. An alternative formulation of the 
EL in such situations (called the adjusted EL) is given by Chen, Variyath 
and Abraham (2008), which is further refined and studied by Emerson and 
Owen (2009). The adjusted EL method adds additional pseudo-observations 
[a single one in Chen, Variyath and Abraham (2008) and two in Emerson 
and Owen (2009)] so as to cover a hypothesized value of the mean param- 
eter within the convex hull of the augmented data set, thereby making the 
adjusted EL well-defined. A second approach, due to Bartolucci (2007), is to 
drop the convex hull constraint in the formulation of the EL altogether and 
redefine the likelihood of a hypothesized value of the parameter by penal- 
izing the unconstrained EL using the Mahalanobis distance. The penalized 
EL (PEL) of Bartolucci (2007) is well defined for all values of p < n, as long 
as the sample covariance matrix is nonsingular. However, due to the use of 
the inverse of the sample covariance matrix in its formulation, the PEL of 
Bartolucci (2007) is also not well defined for p> n. Bartolucci (2007) estab- 
lishes a chi-squared limit of the PEL for the population mean in the case 
where the dimension p is fixed and finite for all n. Other variants of the PEL 
where a penalty function is added to the standard EL, in the spirit of the 
penalized likelihood work of Fan and Li (2001) and Fan and Peng (2004), 
are considered by Otsu (2007) and Tang and Leng (2010). Both these papers 
consider the high dimensional set up and establish validity of their methods 
still requiring p to grow at most as a fractional power of the sample size n. In 
this paper, we introduce a modified version of the PEL method of Bartolucci 
(2007) that is computationally simpler and that is applicable to a large class 
of "large p small n" problems, allowing p to grow faster than n. This is an 
important step in generalizing the EL in high dimensions beyond the p<n 
threshold where the standard EL and its existing variants fail. 
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To briefly describe the proposed methodology and the main results of the 
paper, suppose that Xi,. . . ,X„ are independent and identically distributed 
(i.i.d.) RP-valued random vectors with mean fi £ R^, 1 <p < oo. Denote the 
jth component of a p- vector xhy Xj, j = 1, . . . ,p. The proposed PEL employs 
a multiplicative penalty term to penalize the likelihood of a hypothesized 
value n of the population mean as a quadratic function of the distance be- 
tween the sample mean and However, unlike Bartolucci's (2007) method, 
the use of the inverse sample covariance matrix is completely avoided, as 
consistency of the sample covariance matrix in the high dimensional case 
for all the dependence structures that we consider in this paper is not guar- 
anteed. The proposed PEL instead uses a component-wise scaling to bring 
up the varying degrees of variability (variances) along different components 
to a common level, and then it applies an overall penalty on the sum of 
the squared rescaled differences; see (2.1) in Section 2 below. As a result, 
the proposed PEL is well- defined for all values of n,p > 1. Further, this 
approach has the added advantage that it does not require inversion of a 
high-dimensional matrix, and therefore, it is computationally much simpler. 

For investigations into the theoretical properties of the proposed PEL 
method, we allow the components of Xi to be dependent. The range of 
dependence that we consider covers the cases of: 

(i) short-range dependence (SRD), where roughly speaking, the aver- 
age of the components of Xi satisfies a central limit theorem (CLT) under 
suitable moment conditions; cf. Ibragimov and Linnik (1971); 

(ii) long-range dependence (LRD), where under suitable regularity con- 
ditions, the average of the components satisfies noncentral limit theorems 
[Taqqu (1975, 1977), Dobrushin and Major (1979)]; 

(iii) nonergodicity (NE), where the dependence is so strong that the aver- 
age of the components even fails to satisfy a (strong) law of large numbers. 

We refer to the LRD and SRD cases collectively as the ergodic (E)-case, 
as the negative logarithm of the PEL ratio statistic Kn (say) here satisfies 
a law of large numbers without further centering and scaling, for any rate 
of growth of p] cf. Remark 4.2 below. However, such degenerate limits laws 
are not always the most useful in practice as these only lead to conservative 
large sample inference procedures. By using suitable centering and scaling, 
we are able to further refine these results and establish convergence in dis- 
tribution to nondegenerate limits. Specifically, we show that under SRD, 
Kn with centering at 1 [for c* = 1 in condition (C.2)(ii) below] and scaling 
by square-root of the dimension p of the observations converges to a Nor- 
mal limit, very much like the results of Hjort, McKeague and Van Keilegom 
(2009) and Chen, Peng and Qin (2009), but allowing a much faster rate of 
growth of p and allowing a more general dependence framework. In the long 
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Fig. 1. Envelope for the growth rates of p for a nondegenerate limit law of the PEL ratio 
statistic. The smaller the a, the stronger is the dependence. 



range dependent (also abbreviated as LRD) case, Kn with a suitable nor- 
malization can have both Normal and non-Normal limits. For the Normal 
limit, the centering and the scaling sequences are the same as those used in 
the SRD case, except at the boundary layer of dependence where the Nor- 
mal limit switches over to the non-Normal limit. For the non-Normal limit 
under LRD, the centering term is the same as that in the SRD case, but 
the scaling depends on the rate of decay of the auto-correlation coefficient 
of the components of Xi (up to a possibly unknown permutation). Finally, 
in comparison to the E-case, Kn in the NE-case is shown to converge in 
distribution to a stochastic integral, and it does NOT require any further 
centering and scaling. 

The growth rate of p, for which a nondegenerate limit law holds for a 
suitably transformed Kn, primarily depends on the strength of dependence 
among the components of the observations; cf. Figure 1. In the NE-case, p 
can grow arbitrarily fast (e.g., polynomial, exponential, super-exponential, 
etc.) as a function of n. In the E-case, although a degenerate limit law holds 
for an arbitrary growth rate of p, for a nondegenerate limit, p must admit 
a suitable upper bound. In particular, for the Normal limit in the E-case 
(excluding the boundary case), the growth rate of p as a function of n is 
p = o{n'^). For the non-Normal limit under the E-case, p = o(n^/") for < 
a < 1/2 where, roughly speaking, a denotes the exponent of the rate of decay 
of the autocorrelation among the components of Xi, up to a permutation; cf. 
condition (0.4)^, Section 3. The boundary case is given by a = 1/2, where 
the growth rate is slightly smaller and is given by p = o(n/[logn]^). From 
Figure 1, it follows that the stronger the dependence among the components 
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of the observations, the higher is the ahowable growth rate of p as a function 
of n for a nondegenerate Hmit. The hmiting case a — )• 0+ is the NE-case. 
Here the nondegenerate hmit for the the negative logarithm of the PEL ratio 
statistic holds for an arbitrary growth rate of p as a function of n. 

It is worth pointing out that in most cases, the limit distribution of the 
PEL ratio statistic is not distribution free in the sense that the asymptotic 
approximation to the distribution of the PEL ratio statistic requires the 
knowledge of one or more unknown population parameters. As a result, 
the limit laws are not directly usable in practice. To address this issue, 
we propose a calibration procedure based on the subsampling method. We 
show that under mild conditions, the subsampling based calibration method 
is consistent under all three types of dependence structures. 

The key step in the proofs is to derive a quadratic asymptotic approx- 
imation to Kn under all three cases of dependence. This is presented in 
Lemma 6.2 for the NE-case and in the proof of Theorem 3.2 for the E-case. 
The derivation of the limit law in the NE-case uses some weak convergence 
and operator convergence results on Hilbert spaces. On the other hand, in 
the E-case, refined approximations to Kn are required to go beyond their 
degenerate limits. See Section 6 for more details. 

The rest of the paper is organized as follows. In Section 2, we describe the 
PEL methodology. In Section 3, we introduce the asymptotic framework and 
establish the limit distributions of the logarithm of the PEL ratio statistic 
under the three dependence scenarios. In Section 4, we describe the subsam- 
pling method and prove its validity for all three cases. We report the results 
from a moderately large simulation study in Section 5. Proofs of the results 
are given in Section 6. 

2. Formulation of the PEL. Let Xi, . . . , X„ be i.i.d. random vectors with 
mean fi G W. Let Xij denote the jth component of Xi, 1 <i <n, 1 < j <p- 
Also, let A' denote the transpose of a matrix A. We define the penalized 
empirical likelihood (PEL) of a plausible value fi = {fii, . . . ,^p)' of the pop- 
ulation mean as 

2\ N 



(2.1) Ln{fi)= sup S exp -A^(5j 



{7ri,...,7r„)'Gn 




'^TTi{Xij - flj) 



.1=1 



where II^ = {(vri, . . . , 7r„)' E [0, 1]" : VTj = 1}, Jj's are component specific 

weights (which may be random), and A = A„ G [0,oo) is an overall penalty 
factor. Here we use 

(2.2) Sj = 6nj=S~Jl{Snj^0), 

where s'^j = J2i=ii-^ij ~ ^njY the sample variance of the jth compo- 
nents of Xi, . . . , Xn, Xnj = Z^iLi and where l(-) denotes the indica- 
tor function. This choice of the component-wise scaling allows us to adjust 
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for the heteroscedasticity along different co-ordinates of Xi and therefore, 
the overall penalization parameter A gives comparable weights to all com- 
ponents. In addition, the choice of the penalty function makes the proposed 
PEL invariant with respect to component- wise scaling, which is an inher- 
ently desirable property, particularly while dealing with high-dimensional 
variables, where the assumption of homoscedasticity among a large number 
of components is unrealistic. The maximizer of the product YYi=i (2-1) 
without the penalty term is given by vrj = 1/n for i = 1, . . . ,n. Hence, the 
PEL ratio statistic at a plausible value /i of the mean vector is defined as 

We now compare our formulation with the PEL of Bartolucci (2007), 
which is defined as 

-"""-.■...s.oi(n^)-(- "'°'^'^''°''" )[ 

where v = Y.i=i T^i^i^ = n"^ YA=ii^i ~ ^n){Xi - Xn)' is the sample co- 
variance matrix, and his a penalty parameter. Note that for a large p£ {l,n], 
the sample variance matrix Vn is ill-conditioned, if the smallest eigen-value 
of the underlying covariance matrix S (say) of Xi is not bounded away from 
zero, and it is always singular when p> n, requiring further modifications 
to make (2.3) well defined. Under either of the two scenarios, the PEL based 
on (2.3) can be computationally demanding and unstable. In comparison, 
the component-wise scaling in (2.1) only involves 1-dimensional operations 
which is computationally much simpler and feasible even for a large p. A sec- 
ond limitation of (2.3) is the lack of attractive theoretical properties of 
(or of its variants) in high dimensions. Indeed, consistency of the sample co- 
variance matrix (and its banded or tapered versions) in the high-dimensional 
setting is questionable in presence of strong correlations among the compo- 
nents of Xi that we consider here. Existing work on consistency of the sam- 
ple covariance matrix is known only under suitable conditions of sparsity 
or weak dependence; cf. Bickel and Levina (2008), El Karoui (2008), Cai, 
Zhang and Zhou (2010). Our formulation also avoids this problem altogether 
by using component- wise scaling. 

For the sake of completeness, we also briefly describe the penalized EL 
approach of Otsu (2007) and Tang and Leng (2010), specialized to the 
case of the mean parameter for simplicity of exposition. Let L^{fi) = 

sup{nr=i ^n) G n„, X]r=i "^li^i ~ ^) = Ol denote the standard 

EL for fi. Also, let p\{-) be a penalty function, such as the smoothly clipped 
absolute deviation (SCAD) penalty function of Fan and Li (2001). Then, 
the penalized EL considered by Otsu (2007) and Tang and Leng (2010) is 
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of the form 

(2.4) lOTL(^) ^ ^ST(^) , 

where fi = (/Ui, . . . ,/Up)'. For the case of a more general parameter 6 defined 
through a set of estimating equations, the formulation of Otsu (2007) and 
Tang and Leng (2010) replaces L^{iJ.) in (2.4) by the corresponding version 
of the standard EL for 6; cf. Qin and Lawless (1994). As a result, irrespective 
of the target parameter, since (2.4) is directly based on the standard EL, 
this formulation of the penalized EL also suffers from the same limitations 
as the standard EL. In particular, this approach also fails in high dimensions 
whenever p> n/2. 

In the next section, we investigate theoretical properties of the proposed 
PEL method (2.1) under the dependence structures described in Section 1. 

3. Limit distributions. 

3.1. General framework. We establish the limit distribution theory for 
the PEL ratio statistic in a triangular array set up, with n denoting the 
variable driving the asymptotics. Thus, the vectors Xi,. . . , Xn depend on n 
as are their distributions and the dimension p. However, we often suppress 
the dependence on n for simplicity of notation. The limit distribution of 
the PEL ratio statistic depends on the degree of dependence among the 
components of Xi, . . . ,Xn. As stated in Section 1, we can broadly classify 
the dependence structure into two categories: (i) Non-Ergodic (NE) and 
(ii) Ergodic (E). In the NE-case, the dependence among the components of 
each Xi is so strong [cf. condition (C.3) below] that even the law of large 
numbers fails. In this case, we show that under appropriate conditions, the 
PEL ratio statistic has a nondegenerate limit. In contrast, in the E-case, 
the corresponding limit is degenerate, and further centering and scaling are 
needed for nondegenerate limit laws which, in turn, depend the type of 
dependence (SRD or LRD). We begin with the NE-case. 

3.2. Limit distribution for the nonergodic case. We need to introduce 
some notation at this stage. Let Pn{jJ) denote the correlation between 
Xij and Xii, 1 < j, I < p. Let alj = Var(Xij) and D^^ = {x€R: P{Xij = 
x) > 0}, 1 < j < n > 1. Write C to denote a generic constant in (0,oo). 
Also, for any two sequences {on}n>i and {bn}n>i S (0,cxd), write a„ ~ 

if an/bn — )■ 1 as n — )• oo. Let L'^[0, 1] denote the set of all square integrable 
functions on [0, 1] (with respect to the Lebesgue measure on [0, 1]), equipped 
with the inner product {f,g) = fg, f,g^ L'^iO, !]• Let {4>k - k ^N} denote 
a complete orthonormal basis of L^[0,1], where N = {1,2,...} denotes the 
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set of all positive integers. For any (bounded) jointly measurable function 
h : [0, 1]^ — M, define the operator Th on L^[0, 1] by Thf = h{-;u)f{u) du 

for / G L^[0, 1]. For x, y € M, let x A y = min{x, y}. 

We shall make use of the following conditions for deriving the limit dis- 
tribution of —logRnin)- The values of the integers r and s below will be 
specified later in the statements of the theorems. 

Conditions. 

(C.l) (i) m&x{E[alj6jY ■.l<j<p} = 0(1) for a given s G N. 

(n) limsup„^^ max{P(X„j- = x) : x G D^j, l<j<p}<l. 
(C.2) (i) For a given r E N, max{E\Xij\'' :l<j<p}<C. 

(ii) A„ = c^n/p for some c* G (0, oo). 
(C.3) There exists a correlation function pq{-, •) of a mean-square continuous 

process on [0, 1], and for each n > 1, there exists a permutation in of 

{!,..., p„,} such that Pn{j^l) = Po(^^^^) ^^^^)- Further, with c* as in 

(C.2), po(")") satisfies the following: 

(i) /o (u, v) du dz; < 1 ; 

(ii) sup{|/3o(?^ + /ii,f + /i2)-po(^i,'y)| : < 6, |/i2| < (^} < g{6)H{u,v) 
for all u,v,u + hi,v + h2 G [0, 1] for some function g(-) satisfy- 
ing g{6) — )• as S ^0 and for some function H{-,-) satisfying 
Efc>i(l<^fc|, |Th(/>A;|) < oo; 

(iii) J2k>i{'Pk,'^o4'k) converges, where Tq = with h = po. 

We now briefly comment on the conditions. Condition (C.l)(i) is a mo- 
ment condition on the scaled component-wise weights Jj's and requires finite- 
ness of the sth negative moment of the sample variance s^^ 's (scaled by the 
respective expected values cr^j's). This condition holds in the case of Gaus- 
sian Xjj's whenever the the sample size n> 2s. Condition (C.l)(ii) is a mild 
condition — it says that none of the Xij 's take a single value with probability 
approaching one. This condition trivially holds if the components of Xi are 
continuous and also in the discrete case, if the supports of Xij^s contain at 
least two values with asymptotically nonvanishing probabilities. Condition 
(C.2)(i) is a moment condition that will be used with different values of r in 
the main theorems of this section, while (C.2)(ii) specifies the growth rate of 
the penalty parameter for a nondegenerate limit of —\ogRn{p^o)- However, 
unlike the standard usage of the penalty parameter in the context of variable 
selection, where different choices of the parameter lead to different sets of 
variables being chosen, here the key role of the penalty parameter is to sta- 
bilize the contribution from the sum of component- wise squared differences 
to the overall "likelihood" in (2.1). Finally, consider condition (C.3) that 
specifies the nonergodic structure of the Xj's. Note that, up to a (possibly 
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unknown) permutation of the co-ordinates, the components of Xi are essen- 
tially correlated as strongly as the variables VF(t)'s coming from a constant 
mean, mean-square continuous process {W{t):t^ [0)1]} (s^-y) with covari- 
ance function po{-). In this case, the dependence among the variables W{i/p) 
and W{j/p), 1 < i < j < p is so strong that the average Sj=i ^U/p) 
may not converge to a constant as p— t- cxo, as one would expect from the 
well-known ergodic theorems. 

Under conditions (C.1)-(C.3), the limit distribution of the log-PEL ratio 
statistic is given by a stochastic integral, as shown by the following result. 

Theorem 3.1. Let conditions (C.l), (C.2) and (C.3) hold with s = 4 
and r = 8, let fiQ denote the true value of n and let oo as oo. Then 

(3.1) -logRn{fio)^'^c^ [ [ Ao{u,v)Z{u)Z{v)dudv, 

Jo Jo 

where Z{-) is a zero mean Gaussian process on [0, 1] with covariance func- 
tion poi') and where the function Ao(-, •) is defined as 

oo 

(3.2) Ao{u,v) = Y,i-'^)''ctpf\u,v), 0<u,v<l, 

k=0 

with Pq^'^\u,v) = 1, Pq^^\u,v) = pq{u,v) and for k>l, 
pI^''^^\u,v) = J ■ J i^Y{po{uj,Uj+i)^po{ui,u)po{uk,v)dui--- duk. 

For a general definition of a stochastic integral of the form (3.1), see 
Cramer and Leadbetter (1967). Note that under condition (C.3), by repeated 
application of the Cauchy-Schwarz inequality, for A: > 1, 

sup |po^^'+^^(u,?;)|<(l-(^)'=-i[2c,]-('=+^) for some 5 G (0,1), 

u,ve[o,i] 

and hence, the limiting stochastic integral is well defined. 

Theorem 3.1 shows that under a suitable choice of the penalty parameter, 
namely, A = c*n/p, the negative log PEL ratio statistic has a nondegenerate 
limit distribution. Note that, unlike the standard version of the EL, we do 
not use the multiple 2 before — logi2„(/io). This is a direct artifact of the 
additional penalty term that we use in the formulation of PEL. Also, unlike 
most high-dimensional problems where the validity of a large sample infer- 
ence procedure breaks down beyond a certain (often exponential) rate of 
growth of p, the PEL and the associated limit distribution of — logi?„(/Uo) 
in the NE-case remains valid for arbitrary rate of growth of p as a function 
of the sample size. Thus, in the NE-case, it is possible to carry out simulta- 
neous hypothesis testing for a very large number of parameters even with a 
moderately large sample. 
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3.3. Limit distribution for the ergodic case. In the E-case, we shall make 
use of the following conditions, for a £ (0,oo): 

(C.4)q There exists a covariance function Pa{') on Z = {0, ±1, ±2, . . .}, 
and for each n > 1, there exists a (possibly unknown) permutation of 
{!,... ,pn} such that Pa{k) ~ C|A;|~" as /c — )• oo and 

Pn{j,l) 



sup 

l<j,l<p 



1 



o(l) as n — )• oo. 



Pa{Ln{j) - tn(0) 

(C.5)q, There exists a constant C > such that 

Qnik) < Ck-"" for all /c > 1, n > C, 

where Qn{') denotes the ^-mixing coefficient of the variables {Xij : 1 < j < p}, 
defined by Qn{k) = sup{|P(An - P{A)P{B)\/ ,J P{A)P{B) ■.AeFY'^Be 
•^m+fc' 1 < ""T- < P — Here, denotes the d-field generated by {Xij : a < 
J < b}, 1 < a < 6 < p, Xij = Xi^(j), and r = is the inverse of the permu- 
tation in in (C.4)a. 

Condition (C.4)q says that up to a (possibly unknown) permutation of the 
coordinates, the components of the Xj-vectors have a dependence structure 
that is asymptotically similar to the one given by Pa{')- Note that the sum 
^'kLo\Paik)\ diverges if and only if a < 1, and therefore, we classify the 
dependence structure of the Xij's as LRD or SRD according to a < 1 or 
a > 1, respectively; cf. Beran (1994). Condition (C.5)q is a decay condition 
on the £)-mixing coefficient of the reordered variables {Xij : 1 < j < p}. Note 
that by (C.4)q,, the correlation coefficient between Xij and Xik is pa{j — 
fc)(l + 0(1)), and therefore, the re-ordered sequence {Xij :1< j <p} behaves 
approximately like a stationary time series with the natural time- index j. 
Thus, condition {C.5)a specifies the degree of dependence of the Xjj's, up 
to a permutation that need not be known to the user. 

3.3.1. Results under short-range dependence. The following result shows 
that the log-PEL ratio statistic in the SRD case converges to a Normal limit 
after a suitable centering and after scaling by the "standard factor" p^^"^. 

Theorem 3.2. Let conditions (C.l), (C.2) and (C.4)q,, (C.5)q, hold for 
some a> 1, s > 6, r > 12. Let = 2c1 X^^^q Po(^)^- Then, for p = o(n^), 

(3.3) p^/^[-logRn{po) - c*] -^''N{0,^). 

We now comment on Theorem 3.2. From the proof, it follows that the 
distribution of the log-PEL ratio statistic, for a given sample size, is close 
to the sum of p weakly dependent chi-squared random variables with one 
degree of freedom. As a result, centering at 1 and scaling by p^/^ yields a 
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nondegenerate Normal limit. The effect of the weak dependence shows up 
in the variance of the limiting Normal distribution, which depends on the 
correlation structure of the components of Xi . It is worth noting that in the 
SRD case, one can use Normal critical points with an estimated variance to 
calibrate simultaneous tests of p hypotheses using the EL. 

Theorem 3.2 extends existing results on the EL in more than one direc- 
tion. Hjort, McKeague and Van Keilegom (2009) and Chen et al. (2009) 
proved a version of the result (i.e., a Normal limit) for the standard log-EL 
ratio statistic in increasing dimensions with centering at 1 and scaling by 
p^/^. In comparison, Theorem 3.2 relaxes the restriction on the dimension p 
of the parameter fi, by allowing it to grow faster than the sample size. This 
should be compared with the best available rate of p = o(n^/^), obtained by 
Chen et al. (2009). Further, Theorem 3.2 covers a wide range of dependence 
structures of the components of ^i/s which are not covered by the earlier 
results (e.g., here the minimum eigen- value of the covariance matrix of Xi 
need not be bounded away from zero). However, the most important impli- 
cation of Theorem 3.2 is that under SRD, the penalization step circumvents 
the limitation of the standard EL which is known to break down beyond the 
threshold p < n/2, as shown by Tsao (2004). 

3.3.2. Results under long-range dependence. For a G (0,1], the sum 
^k^iPo{k) fails to converge absolutely, and we refer to this as the LRD 
case. Sums of LRD random variables are known to have either a Normal 
or a non-Normal limit, depending on the value of a. The next result deals 
with the case where a can be very small, and the limit law is non-Normal. 
Further, the scaling also depends on the correlation decay parameter a, as 
shown by Theorem 3.3. Let r(a) = f^t°'~^e~''dt and i = 

Theorem 3.3. Let conditions (C.l), (C.2) and {CA)a, (C.5)a hold for 
some a E (0, ^), s > 6 and r > 12. If p = o{n^^°'), then 

(3.4) p'^[-logRn{fio)-c.]^''W, 

where W is defined in terms of a bivariate Wiener-Ito integral with respect 
to the random spectral measure ^ of the Gaussian white noise process as 

W = c42Tiar' [ exp(^[^i+^2])-l ^^^ ^ 

J l[xi+X2\ 

Theorem 3.3 shows that under very strong dependence (i.e., for small 
values of a) in the E-case, the log-PEL ratio statistic, with the same cen- 
tering but a different scaling factor, has a nondegenerate limit distribution 
and the limit law is non-Normal. Further, the range p for which the result 
holds is p = o(n^/°), which is a decreasing function of a. Thus, the stronger 
the dependence among the co-ordinates of ^ij's, the larger is the allowable 
growth rate of p as a function of n for the validity of the limit distribution. 
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Theorems 3.2 and 3.3 exhaust the types of hmit laws for the log-PEL 
ratio statistic in the E-case. However, in terms of the rate of decay of the 
correlation function, these leave out the case where a € [1/2,1]. Although 
a € [1/2,1] corresponds to LRD in the traditional sense, the centered and 
scaled versions of the log-PEL ratio statistic continue to have a Normal limit 
as shown by the following result. Curiously, the scaling sequence as well as 
the growth rate of p depend on whether q = 1/2 or a G (1/2, 1]. 

Theorem 3.4. Let conditions (C.l), (C.2) and (C.4)q,, (C.5)q, hold for 
some 1/2 < a < 1, s > 6,r > 12. 

(i) // 1/2 < a < 1 andp = o{n^), then (3.3) holds. 

(ii) Ifa=l/2 and p = o{[n/ log n]'^), then 

(3.5) [p \ogp\ [- log (/io) - c.] ^'^ (0, c^) . 

Thus, it follows from Theorem 3.4 that the log-PEL ratio statistic is 
asymptotically Normal for all a > 1/2, although the components of Aj's 
have LRD when a € [1/2, 1]. The peculiar behavior of the scaling sequence 
at the boundary value a = 1/2 is essentially determined by the growth rate 
of the series Yl'j=iPa{j) ^ p^oo, which is asymptotically equivalent to 
logp for Q = 1/2 but it is bounded for a > 1/2. 

Remark 3.1. Proofs of Theorems 3.2-3.4 show that for any p — )• oo, 
— logRn{fJ-o) -^p as n — )• oo, that is, the log-PEL ratio statistic has a de- 
generate limit under all sub-cases of the E-case, for arbitrarily large p as a 
function of n. However, for nondegenerate limits, refined approximations to 
the difference [— logi?„(^o) — c*] are needed. Here, we are able to show that 
an approximation of the form 

(3.6) -logi?„(^o)-c* = r„ + £;„ 

holds for all sub-cases of the E-case, where T„ is a centered sum and where 
En is an error term, roughly of the order of Op{n~^). Further, r„ has a 
nondegenerate limit up to a suitable scaling, as a function of p, depending 
on the dependence structure of the Xij^s. The bounds on the growth rate 
of p in the different sub-cases of the E-case are then determined by the 
requirement that the scaled error term be asymptotically negligible. For 
example, in the SRD case, p ' A(0,k2) and hence, p^/'^En -^p if 

and only if p^^'^/n — )■ 0, which is equivalent to the bound p = o{n?). Similar 
considerations lead to the respective upper bounds in the other sub-cases of 
the E-case. 

Remark 3.2. It is worth pointing out that the PEL can be used for con- 
structing "conservative" large sample simultaneous tests of the p hypotheses 
Hq : fi = i^lq for arbitrarily large p in the E-case. Indeed, for p growing faster 
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than the upper bounds given in Theorems 3.2-3.4, a conservative large sam- 
ple simultaneous test oi Hq : fi = fiQ rejects Hq if |c^= + logi?(/.fo)| > logn. 
Note that by (3.6), this test attains the ideal level asymptotically. 

4. A subsampling based calibration. In this section, we describe a non- 
parametric calibration method based on subsampling to approximate the 
quantiles of the nondegenerate limit laws in both E- and NE-cases, which 
typically involve unknown population parameters and hence, cannot be used 
directly in practice. Let = {Xi -.i £ 1} be a subset of {Xi, . . . 

where / C {l,...,n} is of size m and where 1 < m < n (specific condi- 
tions on m are given below). On each Af„(/), we employ the PEL method 
and obtain a version of the PEL ratio statistic R^{fj,;I), by replacing n 
with m and Xi , . . . , Xn by Af„ (/) in the definitions i?„ (a*) • First consider 
the NE case. Here, the subsampling estimator of the distribution function 
G^^{-) = P{— logi2„(^o) ^ •) under the null hypothesis Hq : ^ = is given 
by 

Gl^{x) = \Xn\-^ l{-logRl{^io;I)<x), xeR, 

where X„ is a collection of subsets of {l,...,n} of size m and where \A\ 
denotes the size of a set ^. All possible subsets of size m cannot be used 
mainly due to the sheer number of such sets, and hence, only a small fraction 
of these subsets are used to compute G'^^(-) in practice. In view of the 
block resampling methods for time series data, here we shall take X„ to 
be the collection of all overlapping blocks (subsets) of size m contained in 
{1, . . . ,n}. Then, we have the following result in the NE case. 

Theorem 4.1. Suppose that the conditions of Theorem 3.1 hold and 
that 

(4.1) + m / n = o{l) asn^oo. 

Then, sup^gjg \ G^^{x) — G^^{x) \ — )-p as n —)• oo. 

Next consider the E-case. Note that for a = 1/2, the limit of the log- 
PEL ratio statistic is A^(0, 1), which is distribution free. One can carry out a 
simple test [cf. Beran (1994)] to ascertain if "i? : a = 1/2" is true and then use 
the limit distribution directly to conduct the PEL test of the simultaneous 
p hypotheses Hq: fj, = fiQ using the A^(0, 1) critical points, without the need 
for an alternative calibration. As a result, we concentrate on the values of 
a 7^ 1/2 in the E-case. Let ctn be an estimator of the correlation parameter 
a; cf. Remark 4.1 below. Let R^{no',I) denote the PEL ratio statistic based 
on the subsample under fiQ, and define 

V;:,{I) = bn[-logRUl^o;I)-c,], lein, 
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where 6„ =p""^^/^. Then, a subsamphng estimator of the distribution of 
Vn = hn[- logi?,„(/io) -c*] is given by Gn,a{x) = \^n\~^ E/GXn logK* (^) < 
x),x^M. and we have the fohowing results. 

Theorem 4.2. Suppose that there exists a cq G M such that 

(4.2) (logp)[d — q] — )'p Co asn— )'00. 

(i) For a G (l,oo), let the conditions of Theorem 3.2 and for a £ (1/2,1], 
let those of Theorem 3.4{i) hold. If p/rn^ + m/n = o{l), then 

(4.3) sup|G'„^Q,(x) — P(V^ < x)| — )'p asn— >-(X). 

(ii) If the conditions of Theorem 3.3 hold for some a £ {^,1/2) andp'^/m + 
m/n = o{l), then (4.3) holds. 

Theorem 4.2 shows that for both Normal and non-Normal limit laws under 
the E-case, the subsamphng method provides a valid approximation to the 
distribution of the log-PEL ratio statistic. Hence, one can use the quantiles of 
the subsamphng estimators to calibrate simultaneous tests on ^ in a unified 
manner. This is specially important in the case of non-Gaussian limit laws 
for which the quantiles are difficult to derive. However, for a > 1/2, the limit 
distribution is Gaussian, and an alternative approximation can be generated 
by using a Normal distribution with an estimated variance. Indeed, the latter 
may be preferable to subsamphng from the computational point of view. 

Remark 4.1. In practice, the value of a is not known and must be 
estimated. First consider the case where the permutation tn{-) in (C.4)q 
is known. Then, we are essentially dealing with n i.i.d. copies of a time 
series of length p as observations. By using the n replicates of the time 
series, it is easy to modify standard estimators of a based on a single time 
series [cf. Beran (1994)] to construct an estimator d of a satisfying a — a = 
Op{n~^/'^\logp]~^) as n — 7- oo, which clearly satisfies (4.2) with cq = 0. 

Next consider the case where the permutations in(-) are unknown. In this 
case, it is not possible to identify pairs {j,l), l<j,l<p that correspond to 
the lag-A; correlation pa{k). However, it is still possible to construct estima- 
tors of a that satisfy (4.2). Define 

(n ( p 
en + n-^Y.\ - ^"il'^i 
i=i [ j=i 

where en = 0^=1 M-^nj = 0) and 6j (and Xnj and Snj) are as in (2.2). Note 
that a is invariant under permutations of the components of Xj's and also 
under component-wise location and scale transformations. In Section 6, we 
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show that a satisfies (4.2) under the conditions of Theorem 4.2, even when 
;,„(•) is unknown. In the same spirit, we may use the following estimator of 
the limiting variance in Theorem 3.2 in the case where tn(") is unknown: 

p-i 

(4.5) K2^2c2^c(l,j)'l(|c(l,j)l >2n-i/2iogn), 

i=2 

where c(j, k) = n'^ Y17=ii^ij ~ ^nj){Xkn - Xnk)Sj5k, l<j,k<p. Consis- 
tency of K? holds under mild moment conditions; cf. Section 6. 

Remark 4.2. An important factor that impacts the accuracy of the 
subsampling method is the choice of the subsample size m. Note that 

(4.6) m = CM"o/(^+"«) 

satisfies the requirements of Theorem 4.2, where ao = min{a, 1/2}. At this 
point, we do not know the order of the optimal m for the different cases 
considered here. In the next section, we address this through a numerical 
study and explore the effects of different choices of m on the performance 
of the PEL method. 

5. Numerical study. We assess finite sample performance of the PEL 
method by simulation in a variety of settings. We considered different com- 
binations of the sample size n and the dimension p, with p ~ 2n^/^,n/2,2n 
and n = 40 and 200. The testing problem we considered is Hq : /i = 0, al- 
though any other value of may be used in Hq, as the PEL criterion is 
location invariant. We generated i.i.d. random p- vectors Xi, . . . ,Xn where 
the p coordinates of Xj's had one of the three different types of dependence 
structures, namely: (i) non-Ergodic, (ii) LRD and (iii) SRD, as follows. 

5.1. Algorithms for generating the data. 

5.1.1. The nonergodic case. 

(1) Consider the basis functions in L^[0, 1] given by (f)j{t) = sin (27rji) /\/2 
for j = 1, 2, . . . , 15 and (/>j {t) = cos (27r(j - 15)t) /y/2, for j = 16, 2, . . . , 30 and 
Mt) = 1- 

(2) Generate Zj ~ N(0, 1) i.i.d. and let Xj = (exp{l} + 1) for ah j. 

(3) Define = a,-Wij/p),j = 1, 2, . . . ,p, where VF(-) = E -=0 ZjcPj{-)Xj 
and where aj = ajn are scalars in (0,oo). 

Then, Xi = {Xn , X12, . ■ . , Xip} is a nonergodic series. Replicates of Xi yield 
Xi, . . . , Xn in the NE-case. 

5.1.2. Long-range dependence. For the LRD case, we follow a setup sim- 
ilar to that used in Hall, Jing and Lahiri (1998). We generate stationary 
increments of a self-similar process with self-similarity parameter (or Hurst 
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constant) H = l{2-a) e (1/2,1) for a G (0,1). The algorithm is as follows: 

(1) Generate a random sample Zpo = {Zio, . . . , Zpo} from A^(0, 1). 

(2) Define Zp = U'^TipQ, where U is obtained by Cholesky factorization of 
R into R = U'^U and where R = {{rij)) with = pa{\i — and 

(5.1) pa(A;) = i{(A; + l)2^ + (fc-l)2^-2A;2^}, k>l, 

and Pa{0) = 1. Note that pa{k) ~ CA;~"ask — )• oo. 

Replicates of Zp give the variables Xi, . . . ,Xn in the LRD case. 

For the simulation study here, we considered the NE-case (a = 0) where 
the data were generated by the algorithm in Section 5.1.1 and the LRD cases 
a = 0.1 and 0.8 based on the algorithm of Section 5.1.2. For the SRD case 
(a = oo), Xi was generated by an ARMA(2,3) process with A^(0, 1) error 
variables and parameter vector (—0.4, 0.1; 0.3, 0.5, 0.1). 

5.2. Choice of the subsample size. We also considered different choices 
of the subsample size m in order to get some insight into its effects on the 
accuracy of the subsampling calibration. Note that the feasible choices of the 
subsample size depend on the relative growth rates of both n and p as well 
as on the strength of dependence, here quantified by a. For each pair (p, n), 
we considered three choices of the subsample size m (denoted by the generic 
symbols mi, 7712,01.3), depending on the dependence structure. Specifically, 
for the SRD case (a = 00), we set 

m, = c°-Mi/3, i = 1,2,3, 

where = 0.5, C2 = 1 and Cg = 2. Note that in this case, the random vari- 
ables in Xi, . . . ,Xn form a series of length np and are weakly dependent. 
Further, the target parameter for the subsampling method in the SRD case 
(and also in the LRD case with q > 1/2) is the variance of the limiting Nor- 
mal distribution. Hence, in view of the well-known results on optimal block 
length (for variance estimation) [cf. Hall, Horowitz and Jing (1995), Lahiri 
(2003)], the above choices of the m^'s are reasonable. 

Next consider the case a = 0.1 under LRD, where the limit distribution 
is non-Normal. From the proofs of Theorems 3.2 and 4.2, it follows that the 
prescription for m in (4.6) attempts to balance the bias of the subsampling 
approximation to the limit distribution and its variance. However, for a very 
small value of a, a direct application of (4.6) leads to a very small fractional 
exponent of np, which may be too small in practice. In such situations, 
particularly where p is not very large and the LRD exponent a is small, we 
use the threshold n^^'^ and set 

rrn = mi{a) = c° • max{M"/(i+°) , n^/^}, i = l,2, 3, 

where c-''s are as before. The rationale behind this modification is that for p 
small, we simply treat Xi , Xn as a weakly dependent multivariate time 
series and again employ the known results on the optimal block size. 
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Table 1 

Empirical levels of significance a for the subsampling based PEL, with sample size 
n = 200 at 0.05 significance level and c* = 1. Here we have reported |0.05 — a\ 



a 




Pi =20 






P2 = 100 






Pa = 400 




mi 


1712 


ma 


mi 


m2 


ma 


mi 


m2 


ma 


0.0 


0.0092 


0.0063 


0.0190 


0.0090 


0.0114 


0.0162 


0.0089 


0.0103 


0.0091 


0.1 


0.0099 


0.0081 


0.0171 


0.0045 


0.0069 


0.0129 


0.0149 


0.0134 


0.0094 


0.8 


0.0079 


0.0031 


0.0061 


0.0086 


0.0042 


0.0081 


0.0101 


0.0099 


0.0190 


CXJ 


0.0059 


0.0091 


0.0010 


0.0020 


0.0104 


0.0039 


0.0091 


0.0159 


0.0078 




Finally, 


consider the NE- 


■case, a 


= 0. Note that 


for a = 


: 0, p can 


grow 



at an arbitrary rate with the sample size n for the validity of Theorems 3.1 
and 4.1. Hence, in this case, our choice of m depends only on the sample size. 
We consider the "canonical" choice nii = n^^^ as well as the larger values 
1712 = n^^"^ and = In}!'^ to explore the effects of a larger subsample size 
on the accuracy of the subsampling calibration method. 

5.3. Results. 

5.3.1. Levels of significance in simultaneous tests. Here we consider finite 
sample accuracy of the proposed PEL method for simultaneous testing of 
the p hypotheses 

(5.2) Ho:n = vs. /x 7^ 

at the levels of significance = 0.1, 0.05. The correlation parameter a for 
the subsampling based calibration was estimated by averaging the Taqqu, 
Teverovsky and Willinger (1995) estimator of the Hurst parameter (H) from 
each of the p-time series and by setting d = (2 — 2H). Further, we have used 
the interior-point method [cf. Wright (1997)] as a fast optimization tool 
for computing the PEL ratio statistic, which can handle high-dimensional 
optimization problems efficiently. Tables 1 and 2 report the attained levels 



Table 2 

Empirical levels of significance a for the subsampling based PEL, with sample size 
n = 200 at 0.1 significance level and c* = 1. Here we have reported |0.1 — a\ 



a 




Pi = 20 






P2 = 100 






Pa = 400 




mi 


m2 


ma 


mi 


m2 


ma 


mi 


m2 


ma 


0.0 


0.170 


0.020 


0.075 


0.221 


0.142 


0.090 


0.075 


0.152 


0.227 


0.1 


0.030 


0.033 


0.033 


0.012 


0.005 


0.011 


0.112 


0.133 


0.066 


0.8 


0.011 


0.045 


0.087 


0.123 


0.082 


0.018 


0.108 


0.027 


0.069 


CXD 


0.138 


0.135 


0.065 


0.050 


0.011 


0.003 


0.048 


0.054 


0.026 
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of significance based on 500 simulation runs and n = 200 for tlie target 
significance levels of 0.05 and 0.10, respectively, for different values of p, m, 
and a. 

From the tables, it follows that the PEL does a reasonable job of simul- 
taneous testing of p hypotheses for all 4 cases of dependence, for appropri- 
ately chosen subsample size. Comparing the attained level of significance, it 
is clear that the best choice of the subsample size critically depends on the 
relative sizes of n and p, and more importantly, on the type of dependence 
among the components of Xi. Further, rather surprisingly, the PEL tests at 
the level of significance 0.05 turned out to be more accurate (on an absolute 
scale) than at the level 0.1, for the subsample sizes considered here. 

We also considered the effect of the penalty parameter A„ = c*n/p on 
the performance of the PEL test. In the supplementary material Lahiri and 
Mukhopadhyay (2012) (hereafter referred to as [LM]), we report the empir- 
ical levels of significance of the PEL test for n = 200 and the target level 
0.1 for two other choices of the constant c*, namely, c* = 0.5 and c^, = 2.0. 
The results for the choice c^, = 2 are qualitatively similar to those reported 
in Table 2 (with = 1); in comparison, the accuracy for the case c* = 0.5 
appears to be slightly better than the c* = 1 case. A similar pattern was ob- 
served for the 0.05 level of significance. We also considered the accuracy of 
the empirical significance levels of the PEL tests at a relatively smaller sam- 
ple size re = 40, for c* = 1 and a = 0.1; cf. [LM]. The PEL has a reasonable 
performance even at this low sample size; see [LM] for details. 

5.3.2. Finite sample power properties. To get some idea about the power 
properties of the PEL tests, we computed the probability of Type II error 
for a level 0.1 PEL test with n = 200 and = I under the alternative fJ' = fJ^i 
where the first p/2 components of /ii were equal to 1 and the rest were 0. 
Table 3 gives the power of the PEL test at level 0.1 under fi = fii for different 
combinations of p, a and m. From Table 3, it appears that the power can 
be reasonably high for a suitable choice of the subsample size, although the 
maximum value critically depends on the dimension p of the parameters 
and the strength of dependence a. In particular, the PEL attains a higher 



Table 3 



Power 


of the proposed PEL, with 


sample 


size n — 200 


at 0.1 


significance level and 


c* = 1 


a 




Pi = 20 






P2 = 100 






P3 = 400 




mi 


1712 


ma 


mi 


m2 


ms 


mi 


m2 


ma 


0.0 


0.569 


0.681 


0.929 


0.515 


0.643 


0.791 


0.87 


0.834 


0.766 


0.1 


0.66 


0.71 


0.85 


0.569 


0.903 


0.676 


0.794 


0.8 


0.868 


0.8 


0.515 


0.883 


0.688 


0.75 


0.997 


0.488 


0.70 


0.87 


0.90 


CXD 


0.510 


0.622 


0.870 


0.739 


0.778 


0.996 


0.802 


0.790 


0.939 
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Table 4 

Comparison of the subsampling (SS) and Normal (G) calibrations for n = AQ 





Pi 


= 7 


P2 


= 20 


P3 


= 80 


a. 


G 


SS 


G 


SS 


G 


SS 


0.8 


0.122 


0.151 


0.132 


0.080 


0.136 


0.081 


CXD 


0.098 


0.092 


0.030 


0.111 


0.076 


0.093 



(maximum) power under weaker dependence (a = 0.8, oo) than under strong 
dependence [a = 0, 0.1). 

5.3.3. Comparison with Normal calibration. Note that for a > 1/2, the 
limit distribution of the logarithm of the PEL ratio statistic is Normal and 
therefore, one can use the hmiting Normal distribution with an estimated 
variance to conduct the PEL test. In this section, we compare the perfor- 
mance of the subsampling-based calibration with the Normal distribution- 
based calibration. To estimate the asymptotic variance hc^ = 2c^ YlT=o Poi^)'^^ 
we first estimate po{k) using the sample auto-covariance at lag- A; based on 
the components of individual Xj's and then average them to get an esti- 
mate Pn{k) of poik) for k = l,...,K where K = min{p/2,p^^'^}. Since 
involves the squares of po{k), the plug-in estimator is positive (with proba- 
bility 1). Tables 4 and 5 compare the best performance of the subsampling 
based PEL with the Normal, calibration-based PEL for n = 40 and n = 200, 
respectively. 

Tables 4 and 5 show that, except for the small values of p, the subsampling- 
based PEL method has a better accuracy (marked as bold) than the Normal, 
calibration-based PEL. However, the computational burden associated with 
the subsampling method is typically larger than the Normal-based PEL. 

6. Proofs. Note that for each n > 1, the PEL likelihood function in (2.1) 
is invariant with respect to (i) component-wise scaling and (ii) permuta- 
tion of the p components. Hence, all through this section, without loss of 
generality (w.l.g.), we set the component-wise variance o"^ - = 1 and set the 
permutation t„(j) = j for all 1 < j <p and n > 1. Let C, C(-) denote generic 



Table 5 

Comparison of the subsampling (SS) and Normal (G) calibrations for n — 200 





Pi 


= 7 


P2 


= 20 


P3 


= 80 


a 


G 


SS 


G 


SS 


G 


SS 


0.8 


0.150 


0.113 


0.141 


0.082 


0.174 


0.127 


CXD 


0.111 


0.165 


0.048 


0.103 


0.238 


0.126 
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constants that depend only on their arguments (if any), but not on n. Unless 
otherwise specified, dependence on (limiting) population quantities [such as 
Po{-), mixing coefficients, etc.] are dropped to simplify notation, and limits 
in all order symbols are taken by letting n — )• oo. For x G M, let \_x\ denote 
the largest integer not exceeding x and let = max{x,0}. 

6.1. Limit distribution in the nonergodic case. 

Lemma 6.1. For each n>l, let {Yij = Yjj„, 1 < j < p}, i = 1, 2, . . . , n be 
a collection of p = pn- dimensional random vectors with EYij = and EY"]^- = 

al^ E (0,oo). Let 6j = 6nj = s'^lisnj / 0) where s\^ = n'^ EHil^u " ^in? 



and Ynj = n 



Also, let Zi- 



W,ij,l) = Y.jYii - EYijYu, Wn{j,l) 



n 



~^E7=lW^{j,l) and Djr. = {ye 



\Y--\ 



P{Yij =y) >0}. Suppose that (L.l) msix{E{alj6j)'^ : 1 < j < p} = 0(1); 



(L.2) max{E\Yijf:l<j<p}<C, and (L.3) limsup„_ 
y)-y^Djn,j = l,...,p}<l. Then: 

(a) Fork = 1,2, 3, Y^^i ^i^fn = Op{n-^p) • 

(b) Y.U5AZ]n + Wl] = 0,{n-'p). 



max{P(Y'] 



(d) For r = l,2, maxi<j<„ 

(e) EU(^j-l? = Op{n-'p). 



-V) 



o(Mi/4). 



Proof. By replacing Yij^s with Yij/anj for all i,j, w.l.g., we can assume 
that anj = 1 for all j, n. First consider part (a), k = 3; the proofs of A; = 1, 2 
are similar. By repeated use of Holder's inequality, 

P / P \^/^ / P \ 3/4 / p X 1/4 



' V 



3/4 



=1 



=1 



. Y"- 



which is 0(pn-3), by (L.l), (L.2). This proves (a). Parts (b) and (c) follow 
by similar arguments. As for part (d), note that (for r = 2) 

4n 1/4 



E 



max y^5,K-^ 

^ - - J=l 



< 



nE\Y,5,Yl 



< n 



1/4 



AT.'' 



Finally consider part (e). Note that for j 



1 



Its, 




Tl'J 
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i=l 



HSnj = 0) 



so that 



= 6jY^j - 6jWn{j,j) - l{Snj = 0), 
(6.1) = l{Snj / 0) + - d,Wn{j,j). 

Part (e) can now be proved using (L.1)-(L.3) and the Cauchy-Schwarz in- 
equahty. We omit the details to save space. □ 



Lemma 6.2. Under the conditions of Theorem 3.1, 
- logRM = n^n f ^, . . . , ^"j (I, + 27„A„)-i . . . , + o,(l), 

where Ynj = Yli=i^ij' ^ij — -^ij ~ f^j' 1 ^ i ^ Pi 1 ^ ^ (^f^d A„ = 

i{Pn{i- j)))pxp- 

Proof. W.l.g., let cr„j = 1 for all 1 < j < p, 1 < i < n. Note that 
(6.2) - logi?„(^o) =min{/(7ri, . . . ,7r„) : (tti,. . . ,7r„)' G !!„}, 

where /(vri, . . . , 7r„) = - EILi log(^^i) + Ej=i '^^(EiLi ^i^^i)^- Since /(•) 
is strictly convex in tti , . . . , 7r„ over a closed convex set n„ C , it has a 
unique minimizer in n„. [The maximum of /(•) over n„ is +oo.] To find the 
minimizer, we use a Lagrange multiplier r/ and solve the set of equations 

d 

- — 5r(7ri,...,7rn;r/) =0, l<k<n, 

— S-(7ri,...,x„;r/) =0, 

where c/(7ri, . . . , 7r„; 77) = E"=ilog(^^i) 
'7(X]r=i ~ !)• This leads to the equations 

= vr,-! - 2A„ ^ <5,- ( vr.yi,- ) Yfc, + r?, 



■ A„E^=i5.(Er=i^.>^..)' + 

n 

1 < k <n and 1 = vTj , 



i=i 



i=l 



which, in turn, yield the implicit solution 



(6.3) 



r] = 2Xn ^jM^j — n and 



: n< 



1 + 27„ ^ 6jMnjYkj - 27n M^^.-Jj 1 , 1 < A; < n, 
j=i j=i J 
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where 7„ = A„/n and Mnj = X]r=i "^i^ij- To obtain a more explicit approxi- 
mation, we show that vr^'s are of the form tt^ = n~^(l + Op(l)) uniformly in 
k. In view of Brouwer's fixed point theorem [cf. Milnor (1965)], it is enough 
to show that, with = n~^/^ lofffn). 



max 

l<k<n 



(6.4) 



i 1 1 + 27„ X] 5,Mr,jYkj - 27n ^ ^'jSj \ -\ 

I 7 = 1 7=1 J 



Op(n ^a„^) 



whenever max{|7rfc — n ^\:\<k<n] = 0(n ^a„^). To prove (6.4), we first 
show that 

(6.5) supjTn J^5jM2 . : (vri, . . . ,7r„)' G | = Op(a-2), 

where 11° = {(vri, . . . ,7r„)' G n„ : Ivr^ — n~^\ < Ca~^n~^ for all 1 < k < n}. 
Note that for any (^i, . . . , 7r„)' G , ^^=1 ^^ = 1 = Er=i( V^) ^ Er=i(l " 
nTTj) = =^ ~ = Ej("'^j ~ !) + • Also, (nvTj - 1)_|. > if and only 

if (iff) nvTj > 1 and similarly, (1 — n7rj)+ > iff nvTj < 1. Hence, using the 
bound "InTTj — 1| < Ca~^ for all i = 1, . . . , n," we have 



El 

1=1 



nTTj) 



71 



E 

'i=i 



(1 - nVTi 

niTi 



E 



(nvTi - 1) + 



nvr,; 



E(l — nvT; 



(nvFj - 1)_ 



i=l 



(1 - n7ri)+ ^ 1 + (nvTi - 1)+ 

n n 

J^(l - n7r0^[l + 0(a„,)] + ^(nvr, - 1)^[1 + 0(a„)] 



i=l 



i=l 



< 2C^na„^ for large n. 



By (6.3), n-i ELiC^^fc)"' - 1 = 27„ E^=i <5,M„, [K, -M„j]. Hence by Lem- 
ma 6.1 and the Cauchy-Schwarz inequality. 



2C2 

— >27. 
at 



p \ 1/2 



>2 7„^(^,m: 
V j=i 



2 

nj 



I p 

7„^5jM 
^ i=i 



1/2 



nj 



Op(n-V2) 



uniformly in (vri, • • • ^T^n) G H^^, for n large. Consequently, (6.5) holds. Now 
using (6.5), (6.3), the Cauchy-Schwarz inequality and Lemma 6.1, (6.4) 
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follows. Hence, by Brouwer's fixed point theorem [cf. Milnor (1965)], there 
exists a solution {ir^, . . . ,tt^) of (6.3) satisfying the bound 

(6.6) max{|7r|^ - : 1 < A; < n} = Op{n~^a~^). 

Using the second derivative condition, it is easy to verify that (ttj*, . . . , vr^) 
is a local minimizer of /(•). In view of the strict convexity of /(•) on n„, it 
also follows that (tti, . . . ,7rJJ) is the unique minimizer of /(•) over n„. 

Next let Ti, = 2/^7^ E?=i Ef=i Pn(j, /)^i<5zM0 -MO,, where M^^. = 

Y17=i 1 ^ j ^ P- Then, from (6.3), we have 

-logRM^f{7rl...yj 

n ( p P 1 

= ^log l + 27„^,5X,y^j-27n5^5,(M0^.)' 
i=i I j=i j=i ) 



\2 
nj) 



n ( p p 

i=l [ j=l j=l 



In 



P 

\2 
nj) 



p p 

2n7„ - ^j(K,f - Tin + Rm 

p p 

Vav ■ 



p p 

- 2"7n E E 0M°, + R2„ 
j=i 1=1 

= n7„(y„i, . . . , Ynp){lp + 27„A„)~^(Y:„i, . . . , Ynp)' + Rsn, 

where the remainder terms Rfcn's are defined by subtraction. By the next 
lemma, Rfc„ = Op(l) for k = 1,2,3. Hence, Lemma 6.2 is proved. □ 

Lemma 6.3. Under the conditions of Theorem 3.1, Yl'k=i |Rfcnl — '^pi^)- 
Proof. See [LM] for details. □ 

Proof of Theorem 3.1. RecaU that anj = l,Vj,n. We carry out the 
proof in 2 steps. 
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Step (I): Let Z„(t) = I]i=i(\/^^ni)l((i-i)/p,j/p](*)> * ^ (0, 1] and let Z„(0) = 
Z„(0+). Then, P(Z„ G L2[0, 1]) = 1. The first step is to prove that Z„(-) -^'^ 
Z(-) as elements of L?^, 1]. Recall that : j G N} is a complete orthonor- 
mal basis for L^[0, 1]. By Theorem 1.84 of Van der Vaart and Wellner (1996), 
it enough to show that: 

(i) For any < ti < • • • < < 1, 1 < r < oo, 

(6.7) (Z„(ti), . . . , Zn{tr)) -^'^ (Z(ti), . . . , Z{tr)), 

(ii) For any e>0,5>0, there exists N = N{e,S) e'N such that 

(6.8) limsuppf V|(Z„,,^fc)|2>5 I <e. 

'^^"^ \k=N J 

Part (i) can be proved using Theorem 11.1.6 of Athreya and Lahiri (2006); 
we omit the routine details. For part (ii), it is enough to show that 

(6.9) lim limsupE( V|(Z„,</.fc)n =0. 

Let Ij = {^^, |], j = 1, . . . ,p. Then, by Fubini's theorem and (C.3), 

oo p p 



'^^^ '^J^ J:-' J:-' n n 

^Y.\^ZnAk)f = Y.Y.J2 / Mt)Ms)P0{j/p,l/p)dsdt 
k=l k=l j=l 1=1'^ 

(6.10) 

oo „i „i 

/ / (t)k{t)(t>k{s)pQ{s,t)dsdt + o{l), 
Jo JO 



which equals Ao(/)A;) +o(l). Next, using |pr).('i •) I ^ land J \(l)k{t)\ dt < 

(/ ^ki^))^^'^ ~ 1' show that for each fixed k £N, 



j=i 1=1 Jh Jh 



(pk{t)(pk{s)po{j/p,l/p) dsdt 



(6-11) , 

/ (t)k{t)(t)k{s)po{s,t)dsdt = ((/)fc,To(/'fc). 
Jo 

By (6.10), (6.11) and (C.3), (6.8) follows. Thus, Z„ Z on L2[0,1]. 
Step (II) : Next we establish weak convergence of the quadratic form: 

oo 

(6.12) np-iY;(Ip + 27„A„)-iY„ = J^np-iY;(-27„A„)'^Y„. 

fc=0 

Note that by condition (C.3), \\lnK\? < llTJ'j=iTJ'i=iPo{3 /p,l Ip) ^ 
c1 Jq pQ{x,y) dxdy G (0, 1/4). Hence there exists a nonrandom G (0, 1) 
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such that for any fixed m G N and for n sufficiently large (not depending 
on m), 

oo 

|np-i(y„i, . . . ,y„p)(-27„A„)^(F„i, . . . ,y„p)'| 

k=m 

(6.13) 

oo p oo 

^ Y.P~'T.(^^nj?\\2jnAnf < ||Z„(.)f 6^ 

k=m j=l k=m 

Next, let an.kiii denote the (j, Z)th element of (7„A„)*^. Note that /3o(-, •) is 
uniformly continuous on [0, 1]^. Hence, using induction and condition (C.3), 
it can be shown that for any A; S N, 

By the continuous mapping theorem, it now follows that for any fixed m, 

m m , p p 

J;-Y;(-27„A„)'=Y„ = ^i^J]J]z„(i/p)Z„(//p)a„,fc(j,0 

k=0 P k=0 P j=l k=l 

(6.14) 



.dy-(_2)fc/ / Z(n)Z(z;)cJ/>*(')(n,^;)dndi;. 



k=0 

Thus, by (6.14), partial sums of the infinite series in (6.12) converge to the 
partial sums of the limit series for any fixed m. By (6.13), the tail of the 
infinite series in (6.12) is negligible. It can be shown (cf. [LM]) that the tail of 
the limit series is also negligible. Since, ||Z„(-)||2 — )-'^ ||Z(-)||2, by (6.12)-(6.14) 
and Lemmas 6.1, 6.3, the theorem is proved. □ 



6.2. Limit distribution for the ergodic case. We prove Theorem 3.2 and 3.3, 
using different arguments than the proof of Theorem 3.1. This is necessitated 
by the fact that we need more accurate bounds on the remainder terms that 
must become negligible after the scaling (e.g., by p"°). We will also use the 
notation Op(-) to denote a bound that holds uniformly over i £ {1, . . . ,n} 
as n — )■ oo. For example, Aj„ = Op{a^^) means max{|Aj„| : 1 < i < n} = 
Op{a~^) an n^oo. Similarly, define Op(-). 

Proof of Theorem 3.2. For 1 < i < n, let Ai„ = 27„ Yfj=i ^j^nj^ij, 
Dn = 2jnEU^j(.Kjf and A^^ = A,„ - D„. Then, tt^ = l/[n(l + A^J]. 
Now using — l/n\ < a~^n~^ for l<i<n, one can show (cf. [LM]) that 

(6.15) \Al\=0;{a-') and \A,^\ = 0;{a-'). 
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Hence, by Taylor's expansion of log(l + x) around x = 0, 

n p 



1=1 



(6.16) 



n 



,i=l 



+ 3-iJ]Ai + 0p(na-^). 



4 = 1 



There exist Enj = n'^ EiLii^ii • O^ia''^)], l<j<p, such that (cf. [LM]) 



(6.17) 



i=l 



i=l 



n 



i=l 



Lin = '^In'n E 6jMnjYnj - -Dn the lead term of - log Rnifi) 

p n n 1 / " \ 

j=l i=l i=l \i=l / 

- J E - E ^'nDn + E A,„ • 0,"(a-^). 



i=l i=l 

Hence, from (6.16), 



i=l 



(6.18) 



p _ ^ / n \ ^ n 

log i?„(^) = n7„ E ^J^ni^nj - 2 E " 6 ^ ' 

1=1 \j=l / i=l 



+ Op(na„^). 



Next, define PI=pYX=i Then, 



if a > 1, 



(6.19) 



/3n~ S C'plogP, if a = l, 

ifO<a<l. 



Note that for any 1 < i < n, FiYnjYij = n ^ Y2k=i ^YkjYij = 1/n. Let Vin = 
7n Y7j=ii^jYnjYij - 1/n) and D^^- = {y : -P(yij = y) > 0}. Then there exists 
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a (5 G (0, 1) such that by using (6.1) and the conditions of Theorem 3.2, one 
gets (cf. [LM]) 



nE 7„J](K„,yi,-l/n) +nE 7„^5,■^;3.yl, 



p 



(6.20) 



+ nE 



|7nJ]5^i^(Sni = 0) 



<Cp-'(3l + 0{n-'). 
Using (6.17), (6.20) and the Cauchy-Schwartz inequahty, one can show that 

p p n 

njn j2 ^j^njYnj = ?^7n 5jY^j - Ain [Vin + 1/n] 

(6.21) 



j=l i=l 

-2\ , r^ r„-l„-l, 



+ Op{a-') + Op{a~'p-'Pn). 
Also, from (6.17), for any 1 < A; < n, we have 

p 



p 



27n ^ ^jYnjYkj Akn + Rln{k) 



1 + ^) 27„ ^nj^i^ + ^2n {k) , 

where Rin{k) and R2nik) are remainder terms satisfying (cf. [LM]) 

n 



fc=i 



Next, using similar arguments and noting that EWn{j,j)Y^, = 0{n ^) and 



Var(W„(j, j)^4) < Cn"-^ for all J, one can show (cf. [LM]) that 



(6.22) J2^^n{V^n + l/n)=l^WnY.[^^^^^^A i^ + M^))^ 

i=i V"- + / j^^i \j=i / 
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P P 
(6.23) n7„ = n7„ ^ ^ + Opin~^ + n~'/^p~^Pn), 

i=i j=i 



n 

n 



(6.24) 



=1 



y-A3 = ^872 



,1=1 \j=i , 



(l+Op(l)) 



= Op(n-i+n-VVVn). 

Using (6.18), (6.21) and (6.22)-(6.24) and the fact that Ey^j-yjj = n"^ and 
WaiiYnjYij) < Cn~^ for all one can conclude (cf. [LM]) 

P _ 2 

(6.25) - logi2„(/io) = n-fnY^Y^, - - + Op(na-^) + Op{a-^ p-^ Pn) ■ 

i=i " 

Set Qn = n^^'^{p/n'^)^^^^ . Then — )■ 00, = o(n^/^) and y/p.n.a~^ = o(l), 
there by making the last two terms in (6.25) o(p~^/^) whenever p = o(n^). 
Now Theorem 3.2 follows by adapting the proof of the CLT for a stationary 
sequence of ^p-mixing random variables to triangular arrays. □ 

Proof of Theorem 3.3. Arguments in the proof of Theorem 3.2 yield 
the asymptotic approximation for — logi?n(/xo) in (6.25) with ~ cp'^~'^ for 
all < a < 1/2. Now choose a„ — )• 00 to satisfy p"[n~-'^ + na~^ + a~-'^p~^/3n] — ^ 
0, for example, = Then it follows that 



p"(- logi?„(^o) - C.) = CP" YiiV^Ynjf - 1}^ + O, 



(!)• 



In the case where Xnj's are Gaussian with Qnihj) = Qa{i — j)-, the leading 
term has the same distribution as Wn = (p"'^^'^")^^'^ Yl^=ii^j ~ where 
{Zj} is a stationary Gaussian process with correlation function Qa{-)- Then 
the result of Taqqu (1975) implies the Wn — ^'^ W, and the theorem follows. 
In the general case when Xnj^s are not Gaussian, the theorem follows by 
using convergence of moments of y/nYn to the moments of iV(0, 1) and a 
variant of the diagram formula; cf. Arcones (1994). □ 

Proof of Theorem 3.4. Similar to the proof of Theorem 3.3; see [LM]. 

□ 

Proof of Theorem 4.1. Note that X„ = {/j : 1 < i < n — m + 1} with 
li = + 1, . . . ,i + m — 1}. Let 

jmAn—m+l 

Uj{x)= Yl U-logRUf^o,h)<x), xeR, 

i={j — l)m+l 
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for 1 < j < M where M = \{n — m + l)/m] is the smahest integer not less 
than (n — m + l)/m. By the independence of Uj{x) and Uj^ki^) for k>2, 
one can show (cf. [LM]) that for each x S M, 

E{G^''{x)-P{-logR*^ifio,I,)<x)f 

(6.26) 

<Cn-^Mm'^ = o{l). 

The next arguments are similar to the proof of the Glivenko-Cantelli theo- 
rem [cf. Theorem 13.3 of Billingsley (1999)] and the continuity of the limit 
distribution of — logi?„(/xo); one can complete the proof; see [LM]. □ 

Proof of Theorem 4.2. Similar to the proof of Theorem 4.1. □ 

Proof of Remark 4.1. Here we outline a proof of consistency of the 
permutation invariant estimators a and k of Remark 4.1. W.l.g, suppose 
that n = and anj = for all j, n. First consider the estimator a in (4.4). 
Let An = {snj / for all j = 1, . . . ,p}, n > 1. Then, by condition (C.l), 
P{An) = 0{p6'^~^) for some 6 e (0,1). On the set An, e„ = 0, and using 
arguments similar to those in the proof of Lemma 6.2, one can show that 

n ( p "j 2 n 

en + E E [^*^- - ^^-^^^y = E ^'p + 

i=l [_ j=l ) i=l 

where Rn = Op{p~'^). Note that EIn ~ (i-^a)(2-a) foi' < a < 1 and Var(/ii) = 
is easy to verify that a satisfies (4.2). 
To prove the consistency of k of (4.4), using moderate deviation inequal- 
ities [cf. Gotze and Hipp (1978)], one can conclude that 

(6.27) P(|c(j, A;) — c(j, k)\ > logn for some I < j,k <p) = o(l). 

Next note that Ck~" < n^^/^logn for all k > n^/^'^^h This implies that 
only 0(nV(2")) 

-many correlation terms contribute to k, with probability 
tending to one. Now using (6.27) for the nonvanishing terms and the fact 
that a > 1/2, one can prove consistency of k. □ 
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SUPPLEMENTARY MATERIAL 

Numerical results and proofs (DOL 10.1214/12-AOS1040SUPP; .pdf). 
Additional simulation results and some details of proofs. 
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